{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 🌐 Building an Intelligent Browser Agent with Llama 4\n",
    "\n",
    "This notebook provides a step-by-step guide to creating an AI-powered browser agent capable of navigating and interacting with websites autonomously. By combining the power of Llama 4 Scout, Playwright, and Together AI, this agent can perform tasks seamlessly while understanding both visual and textual content.\n",
    "\n",
    "##### Demo\n",
    "For a detailed explanation of the code and a demo video, visit our blog post: [**Blog Post and Demo Video**](https://miguelg719.github.io/browser-use-blog/)\n",
    "\n",
    "##### Features\n",
    "- Visual understanding of web pages through screenshots\n",
    "- Autonomous navigation and interaction\n",
    "- Natural language instructions for web tasks\n",
    "- Persistent browser session management\n",
    "\n",
    "For example, you can ask the agent to:\n",
    "- Search for a product on Amazon\n",
    "- Find the cheapest flight to Tokyo\n",
    "- Buy tickets for the next Warriors game\n",
    "\n",
    "\n",
    "##### What's in this Notebook?\n",
    "\n",
    "This recipe walks you through:\n",
    "- Setting up the environment and installing dependencies.\n",
    "- Automating browser interactions using Playwright.\n",
    "- Defining a structured prompt for the LLM to understand the task and execute the next action.\n",
    "- Leveraging Llama 4 Scout for content comprehension.\n",
    "- Creating a persistent and intelligent browser agent for real-world applications.\n",
    "\n",
    "***Please note that the agent is not perfect and may not always behave as expected.**\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Install Required Libraries\n",
    "This cell installs the necessary Python packages for the script, such as `together`, `playwright`, and `beautifulsoup4`.\n",
    "It also ensures that Playwright is properly installed to enable automated browser interactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install together playwright\n",
    "!playwright install"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Import Modules and Set Up Environment\n",
    "Set your `Together` API key to instantiate the client client. Feel free to use a different provider if it's more convenient. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from dotenv import load_dotenv\n",
    "from together import Together\n",
    "\n",
    "load_dotenv()\n",
    "\n",
    "client = Together(api_key=os.getenv(\"TOGETHER_API_KEY\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Vision Query Example\n",
    "This function converts an image file into a Base64-encoded string, which is required for LLM querying.\n",
    "\n",
    "The next cell shows an example of how to use the `encode_image` function to convert an image file into a Base64-encoded string, which is then used in a chat completion request to the Llama 4 Scout model.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "from IPython.display import Markdown\n",
    "imagePath= \"sample_screenshot.png\"\n",
    "\n",
    "def encode_image(image_path):\n",
    "        with open(image_path, \"rb\") as image_file:\n",
    "            return base64.b64encode(image_file.read()).decode('utf-8')\n",
    "\n",
    "# Must have an image on the local path to use it\n",
    "base64_image = encode_image(imagePath)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "The image shows a screenshot of the Meta website, featuring an advertisement for Ray-Ban Meta glasses. The ad is prominently displayed on the left side of the page and includes the following elements:\n",
       "\n",
       "* A headline that reads \"Brilliant holidays, captured with Ray-Ban Meta glasses\"\n",
       "* A subheading that states \"Iconic style powered by Meta AI, now 20% off for a limited time.\"\n",
       "* Two call-to-action buttons: \"Shop all\" and \"Learn more\"\n",
       "\n",
       "On the right side of the page, there is a large image of a pair of black sunglasses with purple lenses, accompanied by the text \"Hey Meta\" and a logo.\n",
       "\n",
       "In the top navigation bar, several menu items are visible, including:\n",
       "\n",
       "* Meta Quest\n",
       "* Ray-Ban Meta\n",
       "* Apps and games\n",
       "* Gift guide\n",
       "* About Meta\n",
       "* Support\n",
       "\n",
       "Overall, the image appears to be promoting a limited-time offer on Ray-Ban Meta glasses, which are equipped with Meta AI technology."
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "response = client.chat.completions.create(\n",
    "    model=\"meta-llama/Llama-4-Scout-17B-16E-Instruct\",\n",
    "    messages=[\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\"type\": \"text\", \"text\": \"what is this image about?\"},\n",
    "                {\n",
    "                    \"type\": \"image_url\",\n",
    "                    # Uses a local image path. To use a remote image, replace the url with the image URL.\n",
    "                    \"image_url\": {\n",
    "                        \"url\": f\"data:image/jpeg;base64,{base64_image}\",\n",
    "                    }\n",
    "                },\n",
    "            ],\n",
    "        }\n",
    "    ]\n",
    ")\n",
    "\n",
    "display(Markdown(response.choices[0].message.content))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Helper Functions to Parse the Accessibility Tree\n",
    "\n",
    "The agent will use the accessibility tree to understand the elements on the page and interact with them. A helper function is defined here to help simplity the accessibility tree for the agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_accessibility_tree(node, indent=0):\n",
    "    \"\"\"\n",
    "    Recursively parses the accessibility tree and prints a readable structure.\n",
    "    Args:\n",
    "        node (dict): A node in the accessibility tree.\n",
    "        indent (int): Indentation level for the nested structure.\n",
    "    \"\"\"\n",
    "    # Initialize res as an empty string at the start of each parse\n",
    "    res = \"\"\n",
    "    \n",
    "    def _parse_node(node, indent, res):\n",
    "        # Base case: If the node is empty or doesn't have a 'role', skip it\n",
    "        if not node or 'role' not in node:\n",
    "            return res\n",
    "\n",
    "        # Indentation for nested levels\n",
    "        indented_space = \" \" * indent\n",
    "        \n",
    "        # Add node's name and role to result string\n",
    "        if 'value' in node:\n",
    "            res = res + f\"{indented_space}Role: {node['role']} - Name: {node.get('name', 'No name')} - Value: {node['value']}\\n\"\n",
    "        else:\n",
    "            res = res + f\"{indented_space}Role: {node['role']} - Name: {node.get('name', 'No name')}\\n\"\n",
    "        \n",
    "        # If the node has children, recursively parse them\n",
    "        if 'children' in node:\n",
    "            for child in node['children']:\n",
    "                res = _parse_node(child, indent + 2, res)  # Increase indentation for child nodes\n",
    "                \n",
    "        return res\n",
    "\n",
    "    return _parse_node(node, indent, res)"
   ]
  },
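  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the output format concrete, here is a minimal sketch that runs the same flattening logic on a hand-written tree. The `sample_tree` data is made up for illustration, and `flatten` is a hypothetical, condensed stand-in for `parse_accessibility_tree`, repeated only so the snippet runs on its own.\n",
    "\n",
    "```python\n",
    "# Hand-written stand-in for an accessibility snapshot (hypothetical data).\n",
    "sample_tree = {\n",
    "    'role': 'WebArea',\n",
    "    'name': 'Flights',\n",
    "    'children': [\n",
    "        {'role': 'textbox', 'name': 'Where from?', 'value': 'Phoenix'},\n",
    "        {'role': 'button', 'name': 'Search'},\n",
    "    ],\n",
    "}\n",
    "\n",
    "def flatten(node, indent=0):\n",
    "    # Skip empty nodes and nodes without a role, as the helper above does.\n",
    "    if not node or 'role' not in node:\n",
    "        return ''\n",
    "    line = '{}Role: {} - Name: {}'.format(' ' * indent, node['role'], node.get('name', 'No name'))\n",
    "    if 'value' in node:\n",
    "        line += ' - Value: {}'.format(node['value'])\n",
    "    # Children are indented two extra spaces per nesting level.\n",
    "    children = ''.join(flatten(child, indent + 2) for child in node.get('children', []))\n",
    "    return line + '\\n' + children\n",
    "\n",
    "print(flatten(sample_tree))\n",
    "```\n",
    "\n",
    "Each node becomes one line, so the nesting in the tree shows up as indentation in the text handed to the model."
   ]
  },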
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Define Prompts\n",
    "a) **Planning Prompt:**\n",
    "Create a structured prompt for the LLM to understand the task and execute the next action.\n",
    "\n",
    "b) **Agent Execution Prompt**\n",
    "A structured prompt is created, specifying the instructions for processing the webpage content and screenshots."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "planning_prompt = \"\"\"\n",
    "Given a user request, define a very simple plan of subtasks (actions) to achieve the desired outcome and execute them iteratively using Playwright.\n",
    "\n",
    "1. Understand the Task:\n",
    "   - Interpret the user's request and identify the core goal.\n",
    "   - Break down the task into a few smaller, actionable subtasks to achieve the goal effectively.\n",
    "\n",
    "2. Planning Actions:\n",
    "   - Translate the user's request into a high-level plan of actions.\n",
    "   - Example actions include:\n",
    "     - Searching for specific information.\n",
    "     - Navigating to specified URLs.\n",
    "     - Interacting with website elements (clicking, filling).\n",
    "     - Extracting or validating data.\n",
    "\n",
    "Input:\n",
    "- User Request (Task)\n",
    "\n",
    "Output from the Agent:\n",
    "- Step-by-Step Action Plan:: Return only an ordered list of actions. Only return the list, no other text.\n",
    "\n",
    "**Example User Requests and Agent Behavior:**\n",
    "\n",
    "1. **Input:** \"Search for a product on Amazon.\"\n",
    "   - **Output:**\n",
    "     1. Navigate to Amazon's homepage.\n",
    "     2. Enter the product name in the search bar and perform the search.\n",
    "     3. Extract and display the top results, including the product title, price, and ratings.\n",
    "\n",
    "2. **Input:** \"Find the cheapest flight to Tokyo.\"\n",
    "   - **Output:**\n",
    "     1. Visit a flight aggregator website (e.g. google flights).\n",
    "     2. Enter the departure city.\n",
    "     3. Enter the destination city\n",
    "     4. Enter the departure date\n",
    "     5. Enter the return date\n",
    "     6. Click the 'Done' button to confirm departure and return dates.\n",
    "     6. Click the 'Search' button to find available flights.\n",
    "     7. Extract and compare the flight options, highlighting the cheapest option.\n",
    "\n",
    "3. **Input:** \"Buy tickets for the next Warriors game.\"\n",
    "   - **Output:**\n",
    "     1. Navigate to a ticket-selling platform (e.g., Ticketmaster).\n",
    "     2. Fill the search bar with the team name.\n",
    "     2. Search for upcoming team games.\n",
    "     3. Select the next available game and purchase tickets for the specified quantity.\n",
    "\n",
    "\"\"\"\n",
    "\n",
    "\n",
    "execution_prompt = \"\"\"\n",
    "You will be given a task, a website's page accessibility tree, and the page screenshot as context. The screenshot is where you are now, use it to understand the accessibility tree. Based on that information, you need to decide the next step action. ONLY RETURN THE NEXT STEP ACTION IN A SINGLE JSON.\n",
    "\n",
    "When selecting elements, use elements from the accessibility tree.\n",
    "\n",
    "Reflect on what you are seeing in the accessibility tree and the screenshot and decide the next step action, elaborate on it in reasoning, and choose the next appropriate action.\n",
    "\n",
    "Selectors must follow the format:\n",
    "- For a button with a specific name: \"button=ButtonName\"\n",
    "- For a placeholder (e.g., input field): \"placeholder=PlaceholderText\"\n",
    "- For text: \"text=VisibleText\"\n",
    "\n",
    "Make sure to analyze the accessibility tree and the screenshot to understand the current state, if something is not clear, you can use the previous actions to understand the current state. Explain why you are in the current state in current_state.\n",
    "\n",
    "You will be given a task and you MUST return the next step action in JSON format:\n",
    "{\n",
    "    \"current_state\": \"Where are you now? Analyze the accessibility tree and the screenshot to understand the current state.\",\n",
    "    \"reasoning\": \"What is the next step to accomplish the task?\",\n",
    "    \"action\": \"navigation\" or \"click\" or \"fill\" or \"finished\",\n",
    "    \"url\": \"https://www.example.com\", // Only for navigation actions\n",
    "    \"selector\": \"button=Click me\", // For click or fill actions, derived from the accessibility tree\n",
    "    \"value\": \"Input text\", // Only for fill actions\n",
    "}\n",
    "\n",
    "### Guidelines:\n",
    "1. Use **\"navigation\"** for navigating to a new website through a URL.\n",
    "2. Use **\"click\"** for interacting with clickable elements. Examples:\n",
    "   - Buttons: \"button=Click me\"\n",
    "   - Text: \"text=VisibleText\"\n",
    "   - Placeholders: \"placeholder=Search...\"\n",
    "   - Link: \"link=BUY NOW\"\n",
    "3. Use **\"fill\"** for inputting text into editable fields. Examples:\n",
    "   - Placeholder: \"placeholder=Search...\"\n",
    "   - Textbox: \"textbox=Flight destination output\"\n",
    "   - Input: \"input=Search...\"\n",
    "4. Use **\"finished\"** when the task is done. For example:\n",
    "   - If a task is successfully completed.\n",
    "   - If navigation confirms you are on the correct page.\n",
    "\n",
    "\n",
    "### Accessibility Tree Examples:\n",
    "\n",
    "You will be given an accessibility tree to interact with the webpage. It consists of a nested node structure that represents elements on the page. For example:\n",
    "\n",
    "Role: generic - Name: \n",
    "   Role: text - Name: Phoenix\n",
    "   Role: button - Name: \n",
    "   Role: listitem - Name: \n",
    "   Role: textbox - Name: Where from?\n",
    "Role: button - Name: Swap where to and where from\n",
    "Role: generic - Name: \n",
    "   Role: textbox - Name: Where to?\n",
    "Role: textbox - Name: Return\n",
    "Role: button - Name: \n",
    "Role: button - Name: \n",
    "Role: textbox - Name: Departure\n",
    "Role: button - Name: \n",
    "Role: button - Name: Done\n",
    "Role: button - Name: Search\n",
    "\n",
    "This section indicates that there is a textbox with a name \"Where to?\" filled with Phoenix. There is also a button with the name \"Swap where to and where from\". Another textbox with the name \"where to?\" not filled with any text. There are also textboxes with the names \"Departure\", \"Return\", which are not filled with any dates, and a buttons named \"Done\" and \"Search\".\n",
    "\n",
    "Retry actions at most 2 times before trying a different action.\n",
    "\n",
    "### Examples:\n",
    "1. To click on a button labeled \"Search\":\n",
    "   {\n",
    "       \"current_state\": \"On the homepage of a search engine.\",\n",
    "       \"reasoning\": \"The accessibility tree shows a button named 'Search'. Clicking it is the appropriate next step to proceed with the task.\",\n",
    "       \"action\": \"click\",\n",
    "       \"selector\": \"button=Search\"\n",
    "   }\n",
    "\n",
    "2. To fill a search bar with the text \"AI tools\":\n",
    "   {\n",
    "       \"current_state\": \"On the search page with a focused search bar.\",\n",
    "       \"reasoning\": \"The accessibility tree shows an input field with placeholder 'Search...'. Entering the query 'AI tools' fulfills the next step of the task.\",\n",
    "       \"action\": \"fill\",\n",
    "       \"selector\": \"placeholder=Search...\",\n",
    "       \"value\": \"AI tools\"\n",
    "   }\n",
    "\n",
    "3. To navigate to a specific URL:\n",
    "   {\n",
    "       \"current_state\": \"Starting from a blank page.\",\n",
    "       \"reasoning\": \"The task requires visiting a specific website to gather relevant information. Navigating to the URL is the first step.\",\n",
    "       \"action\": \"navigation\",\n",
    "       \"url\": \"https://example.com\"\n",
    "   }\n",
    "\n",
    "4. To finish the task:\n",
    "   {\n",
    "       \"current_state\": \"Completed the search and extracted the necessary data.\",\n",
    "       \"reasoning\": \"The task goal has been achieved, and no further actions are required.\",\n",
    "       \"action\": \"finished\"\n",
    "   }\n",
    "\"\"\""
   ]
  },
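  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The execution prompt asks the model for a single JSON object whose `selector` follows a `role=Name` format. The snippet below is a minimal sketch of how such a reply could be parsed before acting on it. `extract_action` and `split_selector` are hypothetical helper names rather than part of the agent loop in this notebook, and the sample reply is made up for illustration.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import re\n",
    "\n",
    "def extract_action(response_text):\n",
    "    # Grab the outermost {...} span so extra prose around the JSON is ignored.\n",
    "    match = re.search(r'\\{.*\\}', response_text, re.DOTALL)\n",
    "    if match is None:\n",
    "        raise ValueError('no JSON object found in the response')\n",
    "    return json.loads(match.group(0))\n",
    "\n",
    "def split_selector(selector):\n",
    "    # 'button=Search' -> ('button', 'Search'); partition keeps any later '=' in the name.\n",
    "    role, _, name = selector.partition('=')\n",
    "    return role.strip(), name.strip()\n",
    "\n",
    "# Made-up reply for illustration only.\n",
    "reply = 'Next step: {\"action\": \"click\", \"selector\": \"button=Search\"}'\n",
    "action = extract_action(reply)\n",
    "print(action['action'], split_selector(action['selector']))\n",
    "```\n",
    "\n",
    "Using `partition` rather than `split` keeps element names that themselves contain `=` intact, and `json.loads` raises an error if the model returns anything that is not strict JSON."
   ]
  },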
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Few Shot Examples\n",
    "\n",
    "Performance improves drastically by adding a few shot examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "few_shot_example_1 = \"\"\"\n",
    "User Input: \"What are the best tacos in San Francisco?\"\n",
    "\n",
    "Agent Step Sequence:\n",
    "Step 1: \n",
    "{\n",
    "    \"current_state\": \"On a blank page.\",\n",
    "    \"reasoning\": \"The task is to find the best tacos in San Francisco, so the first step is to navigate to Google to perform a search.\",\n",
    "    \"action\": \"navigation\",\n",
    "    \"url\": \"https://www.google.com\",\n",
    "}\n",
    "\n",
    "Step 2: \n",
    "{\n",
    "    \"current_state\": \"On the Google homepage.\",\n",
    "    \"reasoning\": \"To search for the best tacos in San Francisco, I need to fill the Google search bar with the query.\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"combobox=Search\",\n",
    "    \"value\": \"Best tacos in San Francisco\"\n",
    "}\n",
    "\n",
    "Step 3:\n",
    "{\n",
    "    \"current_state\": \"On Google search results page.\",\n",
    "    \"reasoning\": \"After entering the query, I need to click the search button to retrieve the results.\",\n",
    "    \"action\": \"click\",\n",
    "    \"selector\": \"button=Google Search\"\n",
    "}\n",
    "\n",
    "Step 4: \n",
    "{\n",
    "    \"current_state\": \"On the search results page with multiple links.\",\n",
    "    \"reasoning\": \"From the search results, I need to click on a reliable food-review or blogwebsite link.\",\n",
    "    \"action\": \"click\",\n",
    "    \"selector\": \"text=Yelp\"\n",
    "}\n",
    "\n",
    "Step 5:\n",
    "{\n",
    "    \"current_state\": \"On Yelp's best taqueria near San Francisco page.\",\n",
    "    \"reasoning\": \"The task is complete as I have found the top taquerias in San Francisco.\",\n",
    "    \"action\": \"finished\",\n",
    "    \"summary\": \"I have successfully found the best tacos in San Francisco.\"\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "few_shot_example_2 = \"\"\"\n",
    "User Input: Can you send an email to reschedule a meeting for Dmitry at gmail.com for tomorrow morning? I'm sick today.\n",
    "\n",
    "Agent Step Sequence:\n",
    "Step 1:\n",
    "{\n",
    "    \"current_state\": \"On a blank page.\",\n",
    "    \"reasoning\": \"To send an email, the first step is to navigate to Gmail.\",\n",
    "    \"action\": \"navigation\",\n",
    "    \"url\": \"https://mail.google.com\",\n",
    "}\n",
    "\n",
    "Step 2:\n",
    "{\n",
    "    \"current_state\": \"On Gmail's homepage.\",\n",
    "    \"reasoning\": \"Click the 'Compose' button to start drafting a new email.\",\n",
    "    \"action\": \"click\",\n",
    "    \"selector\": \"button=Compose\"\n",
    "}\n",
    "\n",
    "Step 3:\n",
    "{\n",
    "    \"current_state\": \"In the new email draft window.\",\n",
    "    \"reasoning\": \"Enter Dmitry's email address in the recipient field.\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=Recipients\",\n",
    "    \"value\": \"dmitry@gmail.com\"\n",
    "}\n",
    "\n",
    "Step 4: \n",
    "{\n",
    "    \"current_state\": \"In the new email draft with the recipient filled.\",\n",
    "    \"reasoning\": \"Set the subject line to indicate the purpose of the email.\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=Subject\",\n",
    "    \"value\": \"Rescheduling Meeting\"\n",
    "}\n",
    "\n",
    "Step 5:\n",
    "{\n",
    "    \"current_state\": \"In the new email draft with the subject set.\",\n",
    "    \"reasoning\": \"Compose the email body to politely inform Dmitry about rescheduling the meeting.\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=Email body\",\n",
    "    \"value\": \"Hi Dmitry,\\\\n\\\\nI'm feeling unwell today and would like to reschedule our meeting for tomorrow morning. Please let me know if this works for you.\\\\n\\\\nBest regards,\\\\n[Your Name]\"\n",
    "}\n",
    "\n",
    "Step 6: \n",
    "{\n",
    "    \"current_state\": \"In the new email draft with the body composed.\",\n",
    "    \"reasoning\": \"Click the 'Send' button to deliver the email to Dmitry.\",\n",
    "    \"action\": \"click\",\n",
    "    \"selector\": \"button=Send\"\n",
    "}\n",
    "\n",
    "Step 7:\n",
    "{\n",
    "    \"current_state\": \"On Gmail's homepage after sending the email.\",\n",
    "    \"reasoning\": \"The email has been drafted and sent, fulfilling the task of informing Dmitry about the reschedule.\",\n",
    "    \"action\": \"finished\",\n",
    "    \"summary\": \"Email sent to Dmitry to reschedule the meeting for tomorrow morning.\"\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "few_shot_example_3 = \"\"\"\n",
    "User Input: \"Find the round trip cheapest flight to Madrid.\"\n",
    "\n",
    "Agent Step Sequence:\n",
    "\n",
    "Step 1: \n",
    "{\n",
    "    \"current_state\": \"On a flight booking website.\",\n",
    "    \"reasoning\": \"The task is to find the cheapest round trip flight to Madrid, so the first step is to navigate to a flight aggregator website.\",\n",
    "    \"action\": \"navigation\",\n",
    "    \"url\": \"https://www.example-flight-aggregator.com\",\n",
    "}\n",
    "\n",
    "Step 2: \n",
    "{\n",
    "    \"current_state\": \"On the flight aggregator homepage.\",\n",
    "    \"reasoning\": \"To find flights, I need to fill the departure city field.\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=Departure City\",\n",
    "    \"value\": \"Your City\"\n",
    "}\n",
    "\n",
    "Step 3:\n",
    "{\n",
    "    \"current_state\": \"On the flight search page with departure city filled.\",\n",
    "    \"reasoning\": \"Fill the destination city field with 'Madrid'.\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=Where to?\",\n",
    "    \"value\": \"Madrid\"\n",
    "}\n",
    "\n",
    "Step 4:\n",
    "{\n",
    "    \"current_state\": \"On the flight search page with destination city filled.\",\n",
    "    \"reasoning\": \"Fill the Departure field with a date in future\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=outband_date\",  \n",
    "    \"value\": \"2025-10-15\",\n",
    "\n",
    "Step 5:\n",
    "{\n",
    "    \"current_state\": \"Departure date filled.\",\n",
    "    \"reasoning\": \"Fill the return field with a date in future after the departure date\",\n",
    "    \"action\": \"fill\",\n",
    "    \"selector\": \"placeholder=return_date\",  \n",
    "    \"value\": '2025-12-08',\n",
    "}\n",
    "\n",
    "Step 6:\n",
    "{\n",
    "    \"current_state\": \"Return date filled.\",\n",
    "    \"reasoning\": \"Click the 'Done' button.\",\n",
    "    \"action\": \"click\",\n",
    "    \"selector\": \"button=Done\"\n",
    "}\n",
    "\n",
    "Step 7:\n",
    "{\n",
    "    \"current_state\": \"Done button clicked.\",\n",
    "    \"reasoning\": \"Click the 'Search' button to search the flights.\",\n",
    "    \"action\": \"click\",\n",
    "    \"selector\": \"button=Search\"\n",
    "}\n",
    "\n",
    "Step 8:\n",
    "{\n",
    "    \"current_state\": \"Flight options are displayed.\",\n",
    "    \"reasoning\": \"Extract and compare the flight options, highlighting the cheapest option.\",\n",
    "    \"action\": \"finished\",\n",
    "    \"summary\": \"Cheapest round trip flight to Madrid found and displayed.\"\n",
    "}\n",
    "\"\"\"\n",
    "\n",
    "few_shot_examples = [few_shot_example_1, few_shot_example_2, few_shot_example_3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Define a task and generate a plan of actions to execute\n",
    "\n",
    "You can define your own task or use one of the examples below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define your task here:\n",
    "#task = 'Find toys to buy for my 10 year old niece this Christmas'\n",
    "#task = 'Find tickets for the next Warriors game'\n",
    "task = 'Find the cheapest round trip flight to Istanbul'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate a plan of actions to execute\n",
    "\n",
    "The next cell queries the LLM using the planning prompt to generate a plan of actions to execute. This then becomes each of the individual subtasks for the execution agent to complete."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Generating plan...\n",
      "1. Navigate to Google Flights.\n",
      "2. Enter the departure city.\n",
      "3. Enter \"Istanbul\" as the destination city.\n",
      "4. Select round-trip option.\n",
      "5. Enter the departure date (current date or next available date).\n",
      "6. Enter the return date (one week after the departure date).\n",
      "7. Click the 'Search' button to find available flights.\n",
      "8. Extract and compare the flight options, highlighting the cheapest option.\n"
     ]
    }
   ],
   "source": [
    "print(\"Generating plan...\")\n",
    "planning_response = client.chat.completions.create(\n",
    "    model=\"meta-llama/Llama-4-Scout-17B-16E-Instruct\",\n",
    "    temperature=0.0,\n",
    "    messages=[\n",
    "        {\"role\": \"system\", \"content\": planning_prompt},\n",
    "        {\"role\": \"user\", \"content\": task},\n",
    "    ],\n",
    ")     \n",
    "plan = planning_response.choices[0].message.content\n",
    "print(plan)\n",
    "steps = [line.strip()[3:] for line in plan.strip().split('\\n')]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Create the Browser environment and Run the Agent\n",
    "The necessary modules for web scraping are imported, and the setup for using Playwright asynchronously is initialized.\n",
    "\n",
    "The context is provided to the LLM to help it understand its current state and generate the next required action to complete the provided task. \n",
    "\n",
    "- At any step, you can press **enter** to continue or **'q'** to quit the agent loop. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On the Google homepage.\",\n",
      "    \"reasoning\": \"The task is to find the cheapest round trip flight to Istanbul. The first step is to navigate to Google Flights.\",\n",
      "    \"action\": \"navigation\",\n",
      "    \"url\": \"https://www.google.com/flights\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights homepage.\",\n",
      "    \"reasoning\": \"The task is to find the cheapest round trip flight to Istanbul. The next step is to enter Istanbul as the destination city.\",\n",
      "    \"action\": \"fill\",\n",
      "    \"selector\": \"combobox=Where to?\",\n",
      "    \"value\": \"Istanbul\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights with destination city filled as Istanbul.\",\n",
      "    \"reasoning\": \"The next step is to fill in the departure date to proceed with the flight search.\",\n",
      "    \"action\": \"fill\",\n",
      "    \"selector\": \"textbox=Departure\",\n",
      "    \"value\": \"2025-10-15\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights with departure city set to Phoenix, destination city set to Istanbul, and departure date set to October 15, 2025.\",\n",
      "    \"reasoning\": \"The next step is to fill in the return date field to complete the search criteria for a round-trip flight.\",\n",
      "    \"action\": \"fill\",\n",
      "    \"selector\": \"textbox=Return\",\n",
      "    \"value\": \"2025-10-22\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights with departure city set to Phoenix, destination city set to Istanbul, departure date set to October 15, 2025, and return date set to October 22, 2025.\",\n",
      "    \"reasoning\": \"The next step is to click the 'Search' button to find available flights based on the specified criteria.\",\n",
      "    \"action\": \"click\",\n",
      "    \"selector\": \"button=Search\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights search results page for Phoenix to Istanbul.\",\n",
      "    \"reasoning\": \"The page has loaded with multiple flight options. I need to identify and select the cheapest round-trip flight option to fulfill the task.\",\n",
      "    \"action\": \"click\",\n",
      "    \"selector\": \"text=From 1098 US dollars round trip total\",\n",
      "    \"url\": \"https://www.google.com/flights\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights search results page for Phoenix to Istanbul.\",\n",
      "    \"reasoning\": \"The page has loaded with multiple flight options. I need to sort the results to find the cheapest option.\",\n",
      "    \"action\": \"click\",\n",
      "    \"selector\": \"tab=Cheapest\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  \n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent response: {\n",
      "    \"current_state\": \"On Google Flights search results page with multiple flight options displayed.\",\n",
      "    \"reasoning\": \"The accessibility tree shows multiple flight options with prices. The cheapest option is listed as $1098. To proceed with the task, I need to select the cheapest flight option.\",\n",
      "    \"action\": \"click\",\n",
      "    \"selector\": \"link=From 1098 US dollars round trip total. 2 stops flight with United and Lufthansa. Operated by SkyWest DBA United Express. Leaves Phoenix Sky Harbor International Airport at 10:41 AM on Wednesday, October 15 and arrives at Istanbul Airport at 9:20 PM on Thursday, October 16. Total duration 24 hr 39 min. Layover (1 of 2) is a 2 hr 33 min layover at Los Angeles International Airport in Los Angeles. Layover (2 of 2) is a 6 hr 30 min layover at Frankfurt Airport in Frankfurt am Main. Select flight\",\n",
      "    \"value\": \"\"\n",
      "}\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Press 'q' to quit or Enter to continue:  q\n"
     ]
    }
   ],
   "source": [
    "from playwright.async_api import async_playwright\n",
    "import asyncio \n",
    "import json\n",
    "import re\n",
    "\n",
    "previous_context = None\n",
    "\n",
    "async def run_browser():\n",
    "    async with async_playwright() as playwright:\n",
    "        # Launch Chromium browser\n",
    "        browser = await playwright.chromium.launch(headless=False, channel=\"chrome\")\n",
    "        page = await browser.new_page()\n",
    "        await asyncio.sleep(1)\n",
    "        await page.goto(\"https://google.com/\")\n",
    "        previous_actions = []\n",
    "        try:\n",
    "            while True:  # Infinite loop to keep session alive, press enter to continue or 'q' to quit\n",
    "                # Get Context from page\n",
    "                accessibility_tree = await page.accessibility.snapshot()\n",
    "                accessibility_tree = parse_accessibility_tree(accessibility_tree)\n",
    "                await page.screenshot(path=\"screenshot.png\")\n",
    "                base64_image = encode_image(imagePath)\n",
    "                previous_context = accessibility_tree\n",
    "                response = client.chat.completions.create(\n",
    "                    model=\"meta-llama/Llama-4-Scout-17B-16E-Instruct\",\n",
    "                    temperature=0.0,\n",
    "                    messages=[\n",
    "                        {\"role\": \"system\", \"content\": execution_prompt},\n",
    "                        {\"role\": \"system\", \"content\": f\"Few shot examples: {few_shot_examples}. These are just a few examples; the user will assign you a VERY wide range of tasks.\"},\n",
    "                        {\"role\": \"system\", \"content\": f\"Plan to execute: {steps}\\n\\nAccessibility Tree: {previous_context}\\n\\nPrevious actions: {previous_actions}\"},\n",
    "                        {\"role\": \"user\", \"content\": \n",
    "                         [\n",
    "                            {\n",
    "                                \"type\": \"text\",\n",
    "                                \"text\": f'What should be the next action to accomplish the task: {task} based on the current state? Remember to review the plan and select the next action based on the current state. Provide the next action in JSON format strictly as specified above.',\n",
    "                            },\n",
    "                            {\n",
    "                                \"type\": \"image_url\",\n",
    "                                \"image_url\": {\n",
    "                                    \"url\": f\"data:image/png;base64,{base64_image}\",\n",
    "                                }\n",
    "                            },\n",
    "                         ]\n",
    "                        }\n",
    "                    ],\n",
    "                )\n",
    "                res = response.choices[0].message.content\n",
    "                ## to remove invisible characters, whitespaces and commas:\n",
    "                # Remove any trailing commas\n",
    "                res = res.rstrip(',')\n",
    "                # Remove any invisible characters\n",
    "                res = ''.join(c for c in res if ord(c) >= 32 or ord(c) == 10 or ord(c) == 13)\n",
    "                print('Agent response:', res)\n",
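    "                # Default to a no-op so a failed parse never reuses the previous step's action\n",
    "                output = {\"action\": \"none\"}\n",
    "                try:\n",
    "                    match = re.search(r'\\{.*\\}', res, re.DOTALL)\n",
    "                    if match:\n",
    "                        output = json.loads(match.group(0))\n",
    "                except json.JSONDecodeError as e:\n",
    "                    print('Error parsing JSON:', e)\n",
    "                    previous_actions.append(f\"Error parsing model response as JSON: {e}\")\n",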
    "\n",
    "                if output[\"action\"] == \"navigation\":\n",
    "                    try:\n",
    "                        await page.goto(output[\"url\"])\n",
    "                        previous_actions.append(f\"navigated to {output['url']}, SUCCESS\")\n",
    "                    except Exception as e:\n",
    "                        previous_actions.append(f\"Error navigating to {output['url']}: {e}\")\n",
    "\n",
    "                elif output[\"action\"] == \"click\":\n",
    "                    try:\n",
    "                        # Split on the first '=' only, so names that contain '=' stay intact\n",
    "                        selector_type, selector_name = output[\"selector\"].split(\"=\", 1)\n",
    "                        await page.get_by_role(selector_type, name=selector_name).first.click()\n",
    "                        previous_actions.append(f\"clicked {output['selector']}, SUCCESS\")\n",
    "                    except Exception as e:\n",
    "                        previous_actions.append(f\"Error clicking on {output['selector']}: {e}\")\n",
    "\n",
    "                elif output[\"action\"] == \"fill\" and output[\"selector\"] == \"textbox=outband_date\":\n",
    "                    try:\n",
    "                        # Simulate a click to open the date picker if necessary\n",
    "                        await page.click('button=outband_date')\n",
    "                        await fill_date(page, 'input[name=\"outband_date\"]', output[\"value\"])\n",
    "                        previous_actions.append(f\"filled Departure date field with {output['value']}, SUCCESS\")\n",
    "                    except Exception as e:\n",
    "                        previous_actions.append(f\"Error filling Departure date field with {output['value']}: {e}\")\n",
    "                elif output[\"action\"] == \"fill\" and output[\"selector\"] == \"textbox=return_date\":\n",
    "                    try:\n",
    "                        # Simulate a click to open the date picker if necessary\n",
    "                        await page.click('button=return_date')\n",
    "                        await fill_date(page, 'input[name=\"return_date\"]', output[\"value\"])\n",
    "                        previous_actions.append(f\"filled Return date field with {output['value']}, SUCCESS\")\n",
    "                    except Exception as e:\n",
    "                        previous_actions.append(f\"Error filling Return date field with {output['value']}: {e}\")\n",
    "    \n",
    "                elif output[\"action\"] == \"fill\":\n",
    "                    try:\n",
    "                        # Split on the first '=' only, so names that contain '=' stay intact\n",
    "                        selector_type, selector_name = output[\"selector\"].split(\"=\", 1)\n",
    "                        await page.get_by_role(selector_type, name=selector_name).fill(output[\"value\"])\n",
    "                        await asyncio.sleep(1)\n",
    "                        await page.keyboard.press(\"Enter\")\n",
    "                        previous_actions.append(f\"filled {output['selector']} with {output['value']}, SUCCESS\")\n",
    "                    except Exception as e:\n",
    "                        previous_actions.append(f\"Error filling {output['selector']} with {output['value']}: {e}\")\n",
    "\n",
    "                elif output[\"action\"] == \"finished\":\n",
    "                    print(output[\"summary\"])\n",
    "                    break\n",
    "\n",
    "                await asyncio.sleep(1)\n",
    "\n",
    "                # Pause after each step so the user can inspect the result, then continue or quit\n",
    "                user_input = input(\"Press 'q' to quit or Enter to continue: \")\n",
    "                if user_input.lower() == 'q':\n",
    "                    break\n",
    "                \n",
    "        except Exception as e:\n",
    "            print(f\"An error occurred: {e}\")\n",
    "        finally:\n",
    "            # Always close the browser when the loop ends, whether normally or after an error\n",
    "            await browser.close()\n",
    "\n",
    "# Run the async function\n",
    "await run_browser()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## And that's it! Congratulations! 🎉🎉\n",
    "\n",
    "You've just created a browser agent that can navigate websites, understand page content through vision, plan and execute actions based on natural language commands, and maintain context across multiple interactions.\n",
    "\n",
    "\n",
    "**Collaborators**\n",
    "\n",
    "Feel free to reach out with any questions or feedback!\n",
    "\n",
    "\n",
    "**Miguel Gonzalez** on [X](https://x.com/miguel_gonzf) or [LinkedIn](https://www.linkedin.com/in/gonzalezfernandezmiguel/)\n",
    "\n",
    "**Dimitry Khorzov** on [X](https://x.com/korzhov_dm) or [LinkedIn](https://www.linkedin.com/in/korzhovdm)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "browser-use",
   "language": "python",
   "name": "browser-use"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
